Overview

Dataset statistics

Number of variables: 18
Number of observations: 40690
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 5.6 MiB
Average record size in memory: 144.0 B
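
The memory figures above are related by simple arithmetic. The sketch below recomputes the total size from the row count and the average record size. The 8-bytes-per-value breakdown is an assumption, not something the report states, although it is consistent with the 317.9 KiB memory size listed for each variable.

```python
# Figures copied from the dataset statistics above.
n_observations = 40690
n_variables = 18
avg_record_bytes = 144.0

# Total size = rows x average bytes per row, expressed in MiB (2**20 bytes).
total_mib = n_observations * avg_record_bytes / 2**20
print(f"total size: {total_mib:.1f} MiB")

# Assumption: every column stores one fixed-width value per row,
# so the record size splits evenly across the 18 variables.
bytes_per_value = avg_record_bytes / n_variables
column_kib = n_observations * bytes_per_value / 2**10
print(f"per-column size: {column_kib:.1f} KiB")
```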

Variable types

NUM: 8
CAT: 6
BOOL: 4

Reproduction

Analysis started: 2020-06-26 15:35:29.775553
Analysis finished: 2020-06-26 15:35:42.172066
Duration: 12.4 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml
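
The reported duration can be checked against the two timestamps. This is a minimal sketch using only the standard library; the format string is an assumption based on how the timestamps are printed above.

```python
from datetime import datetime

# Assumed layout of the timestamps shown in the Reproduction section.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2020-06-26 15:35:29.775553", FMT)
finished = datetime.strptime("2020-06-26 15:35:42.172066", FMT)

# Elapsed wall-clock time in seconds, rounded the way the report shows it.
duration_s = (finished - started).total_seconds()
print(f"{duration_s:.1f} seconds")
```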

Warnings

previous is highly skewed (γ1 = 43.46842768) [Skewed]
df_index has unique values [Unique]
balance has 3139 (7.7%) zeros [Zeros]
previous has 33279 (81.8%) zeros [Zeros]
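
For readers unfamiliar with the quantities in these warnings, the sketch below shows one way to compute a skewness coefficient (γ1) and a zero share. It uses the plain moment-based (population) formula, which is an assumption: the report may use a bias-adjusted estimator, so values on real data can differ slightly. The function names are hypothetical.

```python
def moment_skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def zero_share(n_zeros, n_rows):
    """Fraction of rows whose value is exactly zero."""
    return n_zeros / n_rows


# A symmetric sample has no skew; a long right tail gives a positive value.
print(moment_skewness([1, 2, 3, 4, 5]))
print(moment_skewness([0, 0, 0, 0, 10]))

# Zero counts taken from the warnings above.
print(f"previous: {zero_share(33279, 40690):.1%} zeros")
print(f"balance: {zero_share(3139, 40690):.1%} zeros")
```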

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct count: 40690
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 22593.702924551486
Minimum: 0
Maximum: 45210
Zeros: 1
Zeros (%): < 0.1%
Memory size: 317.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2266.45
Q1: 11257.5
median: 22562.5
Q3: 33929.75
95-th percentile: 42961.55
Maximum: 45210
Range: 45210
Interquartile range (IQR): 22672.25

Descriptive statistics

Standard deviation: 13064.34245
Coefficient of variation (CV): 0.5782293631
Kurtosis: -1.204288374
Mean: 22593.70292
Median Absolute Deviation (MAD): 11337.5
Skewness: 0.003007341099
Sum: 919337772
Variance: 170677043.7

Histogram with fixed size bins (bins=10)

Value                 Count  Frequency (%)
2047                  1      < 0.1%
3339                  1      < 0.1%
34074                 1      < 0.1%
40217                 1      < 0.1%
38168                 1      < 0.1%
11535                 1      < 0.1%
9486                  1      < 0.1%
15629                 1      < 0.1%
13580                 1      < 0.1%
1290                  1      < 0.1%
Other values (40680)  40680  > 99.9%

Value                 Count  Frequency (%)
0                     1      < 0.1%
1                     1      < 0.1%
2                     1      < 0.1%
3                     1      < 0.1%
4                     1      < 0.1%

Value                 Count  Frequency (%)
45210                 1      < 0.1%
45208                 1      < 0.1%
45207                 1      < 0.1%
45206                 1      < 0.1%
45205                 1      < 0.1%

age
Real number (ℝ≥0)

Distinct count: 77
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 40.90540673384124
Minimum: 18
Maximum: 95
Zeros: 0
Zeros (%): 0.0%
Memory size: 317.9 KiB

Quantile statistics

Minimum: 18
5-th percentile: 27
Q1: 33
median: 39
Q3: 48
95-th percentile: 59
Maximum: 95
Range: 77
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 10.60490825
Coefficient of variation (CV): 0.2592544384
Kurtosis: 0.3231613161
Mean: 40.90540673
Median Absolute Deviation (MAD): 7
Skewness: 0.6834969594
Sum: 1664441
Variance: 112.464079
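
Several of the statistics listed for age are derived from one another. The sketch below recomputes them from the reported figures as a consistency check. The formulas are the standard definitions and are assumed, not confirmed, to match what the report uses internally.

```python
# Reported figures for the age variable, copied from the report.
n = 40690
total, mean, std = 1664441, 40.90540673, 10.60490825
minimum, q1, q3, maximum = 18, 33, 48, 95

# Each entry pairs a recomputed value with the value the report shows.
checks = {
    "mean = sum / n": (total / n, mean),
    "CV = std / mean": (std / mean, 0.2592544384),
    "variance = std^2": (std ** 2, 112.464079),
    "range = max - min": (maximum - minimum, 77),
    "IQR = Q3 - Q1": (q3 - q1, 15),
}

for name, (computed, reported) in checks.items():
    print(f"{name}: computed {computed:.6f}, reported {reported}")
```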
Histogram with fixed size bins (bins=10)
Value                 Count  Frequency (%)
32                    1869   4.6%
31                    1796   4.4%
33                    1775   4.4%
34                    1725   4.2%
35                    1721   4.2%
36                    1609   4.0%
30                    1574   3.9%
37                    1502   3.7%
39                    1332   3.3%
38                    1330   3.3%
Other values (67)     24457  60.1%

Value                 Count  Frequency (%)
18                    12     < 0.1%
19                    30     0.1%
20                    45     0.1%
21                    70     0.2%
22                    121    0.3%

Value                 Count  Frequency (%)
95                    2      < 0.1%
94                    1      < 0.1%
93                    2      < 0.1%
92                    2      < 0.1%
90                    2      < 0.1%

job
Categorical

Distinct count: 12
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 317.9 KiB

Value                 Count  Frequency (%)
blue-collar           8769   21.6%
management            8504   20.9%
technician            6818   16.8%
admin.                4661   11.5%
services              3725   9.2%
retired               2027   5.0%
self-employed         1427   3.5%
entrepreneur          1339   3.3%
unemployed            1193   2.9%
housemaid             1125   2.8%
Other values (2)      1102   2.7%

Length

Max length: 13
Median length: 10
Mean length: 9.486900958
Min length: 6

marital
Categorical

Distinct count: 3
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 317.9 KiB

Value                 Count  Frequency (%)
married               24464  60.1%
single                11531  28.3%
divorced              4695   11.5%

Length

Max length: 8
Median length: 7
Mean length: 6.831998034
Min length: 6

education
Categorical

Distinct count: 4
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 317.9 KiB

Value                 Count  Frequency (%)
secondary             20951  51.5%
tertiary              11917  29.3%
primary               6153   15.1%
unknown               1669   4.1%

Length

Max length: 9
Median length: 9
Mean length: 8.32265913
Min length: 7

default
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 317.9 KiB

Value                 Count  Frequency (%)
no                    39965  98.2%
yes                   725    1.8%

balance
Real number (ℝ)

ZEROS

Distinct count: 6903
Unique (%): 17.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1359.6975178176456
Minimum: -8019
Maximum: 102127
Zeros: 3139
Zeros (%): 7.7%
Memory size: 317.9 KiB

Quantile statistics

Minimum: -8019
5-th percentile: -173
Q1: 74
median: 451
Q3: 1423
95-th percentile: 5745.55
Maximum: 102127
Range: 110146
Interquartile range (IQR): 1349

Descriptive statistics

Standard deviation: 3034.248783
Coefficient of variation (CV): 2.231561611
Kurtosis: 142.8032515
Mean: 1359.697518
Median Absolute Deviation (MAD): 451
Skewness: 8.410197358
Sum: 55326092
Variance: 9206665.678

Histogram with fixed size bins (bins=10)

Value                 Count  Frequency (%)
0                     3139   7.7%
1                     169    0.4%
2                     147    0.4%
4                     123    0.3%
3                     121    0.3%
5                     102    0.3%
6                     79     0.2%
8                     74     0.2%
23                    68     0.2%
7                     64     0.2%
Other values (6893)   36604  90.0%

Value                 Count  Frequency (%)
-8019                 1      < 0.1%
-6847                 1      < 0.1%
-4057                 1      < 0.1%
-3372                 1      < 0.1%
-3313                 1      < 0.1%

Value                 Count  Frequency (%)
102127                1      < 0.1%
98417                 1      < 0.1%
81204                 1      < 0.1%
71188                 1      < 0.1%
66721                 1      < 0.1%

housing
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 317.9 KiB

Value                 Count  Frequency (%)
yes                   22661  55.7%
no                    18029  44.3%

loan
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 317.9 KiB

Value                 Count  Frequency (%)
no                    34177  84.0%
yes                   6513   16.0%

contact
Categorical

Distinct count: 3
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 317.9 KiB

Value                 Count  Frequency (%)
cellular              26319  64.7%
unknown               11771  28.9%
telephone             2600   6.4%

Length

Max length: 9
Median length: 8
Mean length: 7.774612927
Min length: 7

day
Real number (ℝ≥0)

Distinct count: 31
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.808405013516834
Minimum: 1
Maximum: 31
Zeros: 0
Zeros (%): 0.0%
Memory size: 317.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 3
Q1: 8
median: 16
Q3: 21
95-th percentile: 29
Maximum: 31
Range: 30
Interquartile range (IQR): 13

Descriptive statistics

Standard deviation: 8.318280773
Coefficient of variation (CV): 0.5261935512
Kurtosis: -1.058766298
Mean: 15.80840501
Median Absolute Deviation (MAD): 7
Skewness: 0.0932807123
Sum: 643244
Variance: 69.19379502

Histogram with fixed size bins (bins=10)

Value                 Count  Frequency (%)
20                    2467   6.1%
18                    2090   5.1%
21                    1820   4.5%
17                    1764   4.3%
6                     1738   4.3%
5                     1728   4.2%
14                    1646   4.0%
8                     1644   4.0%
28                    1643   4.0%
7                     1636   4.0%
Other values (21)     22514  55.3%

Value                 Count  Frequency (%)
1                     287    0.7%
2                     1151   2.8%
3                     975    2.4%
4                     1300   3.2%
5                     1728   4.2%

Value                 Count  Frequency (%)
31                    583    1.4%
30                    1407   3.5%
29                    1566   3.8%
28                    1643   4.0%
27                    1004   2.5%

month
Categorical

Distinct count: 12
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 317.9 KiB

Value                 Count  Frequency (%)
may                   12413  30.5%
jul                   6214   15.3%
aug                   5606   13.8%
jun                   4848   11.9%
nov                   3535   8.7%
apr                   2646   6.5%
feb                   2363   5.8%
jan                   1268   3.1%
oct                   660    1.6%
sep                   520    1.3%
Other values (2)      617    1.5%

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

duration
Real number (ℝ≥0)

Distinct count: 1530
Unique (%): 3.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 258.2438436962399
Minimum: 0
Maximum: 4918
Zeros: 3
Zeros (%): < 0.1%
Memory size: 317.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 35
Q1: 103
median: 180
Q3: 319
95-th percentile: 752.55
Maximum: 4918
Range: 4918
Interquartile range (IQR): 216

Descriptive statistics

Standard deviation: 257.5770676
Coefficient of variation (CV): 0.9974180368
Kurtosis: 18.14829807
Mean: 258.2438437
Median Absolute Deviation (MAD): 93
Skewness: 3.138967323
Sum: 10507942
Variance: 66345.94575

Histogram with fixed size bins (bins=10)

Value                 Count  Frequency (%)
124                   169    0.4%
90                    165    0.4%
89                    160    0.4%
136                   159    0.4%
114                   158    0.4%
122                   158    0.4%
139                   158    0.4%
112                   158    0.4%
104                   157    0.4%
113                   156    0.4%
Other values (1520)   39092  96.1%

Value                 Count  Frequency (%)
0                     3      < 0.1%
1                     2      < 0.1%
2                     3      < 0.1%
3                     4      < 0.1%
4                     14     < 0.1%

Value                 Count  Frequency (%)
4918                  1      < 0.1%
3881                  1      < 0.1%
3785                  1      < 0.1%
3366                  1      < 0.1%
3322                  1      < 0.1%

campaign
Real number (ℝ≥0)

Distinct count: 47
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.764585893339887
Minimum: 1
Maximum: 63
Zeros: 0
Zeros (%): 0.0%
Memory size: 317.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
median: 2
Q3: 3
95-th percentile: 8
Maximum: 63
Range: 62
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 3.110157616
Coefficient of variation (CV): 1.124999452
Kurtosis: 39.85929663
Mean: 2.764585893
Median Absolute Deviation (MAD): 1
Skewness: 4.931344977
Sum: 112491
Variance: 9.673080397

Histogram with fixed size bins (bins=10)

Value                 Count  Frequency (%)
1                     15817  38.9%
2                     11218  27.6%
3                     5003   12.3%
4                     3161   7.8%
5                     1589   3.9%
6                     1140   2.8%
7                     646    1.6%
8                     496    1.2%
9                     296    0.7%
10                    247    0.6%
Other values (37)     1077   2.6%

Value                 Count  Frequency (%)
1                     15817  38.9%
2                     11218  27.6%
3                     5003   12.3%
4                     3161   7.8%
5                     1589   3.9%

Value                 Count  Frequency (%)
63                    1      < 0.1%
58                    1      < 0.1%
55                    1      < 0.1%
51                    1      < 0.1%
50                    2      < 0.1%

pdays
Real number (ℝ)

Distinct count: 548
Unique (%): 1.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 40.05986728926026
Minimum: -1
Maximum: 871
Zeros: 0
Zeros (%): 0.0%
Memory size: 317.9 KiB

Quantile statistics

Minimum: -1
5-th percentile: -1
Q1: -1
median: -1
Q3: -1
95-th percentile: 317
Maximum: 871
Range: 872
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 100.0782815
Coefficient of variation (CV): 2.498217998
Kurtosis: 7.047275272
Mean: 40.05986729
Median Absolute Deviation (MAD): 0
Skewness: 2.631138312
Sum: 1630036
Variance: 10015.66242

Histogram with fixed size bins (bins=10)

Value                 Count  Frequency (%)
-1                    33279  81.8%
182                   153    0.4%
92                    132    0.3%
183                   112    0.3%
91                    112    0.3%
181                   99     0.2%
370                   85     0.2%
184                   74     0.2%
95                    68     0.2%
364                   66     0.2%
Other values (538)    6510   16.0%

Value                 Count  Frequency (%)
-1                    33279  81.8%
1                     15     < 0.1%
2                     32     0.1%
3                     1      < 0.1%
4                     1      < 0.1%

Value                 Count  Frequency (%)
871                   1      < 0.1%
854                   1      < 0.1%
850                   1      < 0.1%
842                   1      < 0.1%
838                   1      < 0.1%

previous
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct count: 41
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5794052592774638
Minimum: 0
Maximum: 275
Zeros: 33279
Zeros (%): 81.8%
Memory size: 317.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 3
Maximum: 275
Range: 275
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 2.350663681
Coefficient of variation (CV): 4.057028553
Kurtosis: 4615.65138
Mean: 0.5794052593
Median Absolute Deviation (MAD): 0
Skewness: 43.46842768
Sum: 23576
Variance: 5.52561974

Histogram with fixed size bins (bins=10)

Value                 Count  Frequency (%)
0                     33279  81.8%
1                     2490   6.1%
2                     1886   4.6%
3                     1032   2.5%
4                     639    1.6%
5                     408    1.0%
6                     255    0.6%
7                     190    0.5%
8                     112    0.3%
9                     81     0.2%
Other values (31)     318    0.8%

Value                 Count  Frequency (%)
0                     33279  81.8%
1                     2490   6.1%
2                     1886   4.6%
3                     1032   2.5%
4                     639    1.6%

Value                 Count  Frequency (%)
275                   1      < 0.1%
58                    1      < 0.1%
55                    1      < 0.1%
51                    1      < 0.1%
41                    1      < 0.1%

poutcome
Categorical

Distinct count: 4
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 317.9 KiB

Value                 Count  Frequency (%)
unknown               33284  81.8%
failure               4377   10.8%
other                 1648   4.1%
success               1381   3.4%

Length

Max length: 7
Median length: 7
Mean length: 6.918997297
Min length: 5

y
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 317.9 KiB

Value                 Count  Frequency (%)
no                    35903  88.2%
yes                   4787   11.8%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
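
The calculation described above can be sketched in plain Python as follows. This is an illustrative implementation, not the code the report itself runs, and the function name `pearson_r` is hypothetical. It uses population (divide-by-n) moments; the choice of n or n - 1 cancels in the ratio.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)


xs = [1, 2, 3, 4]
print(pearson_r(xs, [3, 5, 7, 9]))     # increasing linear relation, close to +1
print(pearson_r(xs, [30, 20, 10, 0]))  # decreasing linear relation, close to -1

# Shifting and rescaling one variable leaves r unchanged (up to rounding).
ys = [2, 1, 4, 3]
print(pearson_r(xs, ys), pearson_r([10 * x + 5 for x in xs], ys))
```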

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
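
As a rough sketch of that procedure, the code below ranks both variables and then applies the Pearson formula to the ranks. Tied values receive the average of their rank positions, which is a common convention and an assumption here. The function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# A monotonic but nonlinear relation: rho is close to 1 while r is below 1.
xs = [1, 2, 3, 4, 5]
ys = [1, 8, 27, 64, 125]
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```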

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
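
The pair-counting definition can be sketched directly. This is the simple variant (often called tau-a) with no correction for ties; statistical libraries commonly apply a tie-corrected variant, so results on data with ties may differ. The function name is hypothetical.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """(concordant - discordant) / total pairs, with no tie correction."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # Positive product: both variables order the pair the same way.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


print(kendall_tau([1, 2, 3, 4], [1, 2, 3, 4]))  # every pair concordant
print(kendall_tau([1, 2, 3, 4], [4, 3, 2, 1]))  # every pair discordant
print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))  # 5 concordant, 1 discordant of 6
```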

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
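
For illustration, a bias-corrected Cramér's V along the lines of Bergsma's proposal can be sketched from a contingency table as below. This is a hedged reading of the correction rather than the report's own code: it computes the chi-squared statistic without a continuity correction and assumes every row and column total is non-zero. The function name is hypothetical.

```python
from math import sqrt


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Corrected phi-squared and corrected table dimensions.
    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


print(cramers_v_corrected([[50, 0], [0, 50]]))    # perfect association
print(cramers_v_corrected([[25, 25], [25, 25]]))  # independence
```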

Missing values

Sample

First rows

       df_index  age  job           marital   education  default  balance  housing  loan  contact   day  month  duration  campaign  pdays  previous  poutcome  y
0      0         58   management    married   tertiary   no       2143     yes      no    unknown   5    may    261       1         -1     0         unknown   no
1      1         44   technician    single    secondary  no       29       yes      no    unknown   5    may    151       1         -1     0         unknown   no
2      2         33   entrepreneur  married   secondary  no       2        yes      yes   unknown   5    may    76        1         -1     0         unknown   no
3      3         47   blue-collar   married   unknown    no       1506     yes      no    unknown   5    may    92        1         -1     0         unknown   no
4      4         33   unknown       single    unknown    no       1        no       no    unknown   5    may    198       1         -1     0         unknown   no
5      5         35   management    married   tertiary   no       231      yes      no    unknown   5    may    139       1         -1     0         unknown   no
6      7         42   entrepreneur  divorced  tertiary   yes      2        yes      no    unknown   5    may    380       1         -1     0         unknown   no
7      8         58   retired       married   primary    no       121      yes      no    unknown   5    may    50        1         -1     0         unknown   no
8      10        41   admin.        divorced  secondary  no       270      yes      no    unknown   5    may    222       1         -1     0         unknown   no
9      11        29   admin.        single    secondary  no       390      yes      no    unknown   5    may    137       1         -1     0         unknown   no

Last rows

       df_index  age  job           marital   education  default  balance  housing  loan  contact   day  month  duration  campaign  pdays  previous  poutcome  y
40680  45200     38   technician    married   secondary  no       557      yes      no    cellular  16   nov    1556      4         -1     0         unknown   yes
40681  45201     53   management    married   tertiary   no       583      no       no    cellular  17   nov    226       1         184    4         success   yes
40682  45202     34   admin.        single    secondary  no       557      no       no    cellular  17   nov    224       1         -1     0         unknown   yes
40683  45203     23   student       single    tertiary   no       113      no       no    cellular  17   nov    266       1         -1     0         unknown   yes
40684  45204     73   retired       married   secondary  no       2850     no       no    cellular  17   nov    300       1         40     8         failure   yes
40685  45205     25   technician    single    secondary  no       505      no       yes   cellular  17   nov    386       2         -1     0         unknown   yes
40686  45206     51   technician    married   tertiary   no       825      no       no    cellular  17   nov    977       3         -1     0         unknown   yes
40687  45207     71   retired       divorced  primary    no       1729     no       no    cellular  17   nov    456       2         -1     0         unknown   yes
40688  45208     72   retired       married   secondary  no       5715     no       no    cellular  17   nov    1127      5         184    3         success   yes
40689  45210     37   entrepreneur  married   secondary  no       2971     no       no    cellular  17   nov    361       2         188    11        other     no